Geometric Bounds for Generalization in Boosting
Authors
Abstract
We consider geometric conditions on a labeled data set which guarantee that boosting algorithms work well when linear classifiers are used as weak learners. We start by providing conditions on the error of the weak learner which guarantee that the empirical error of the composite classifier is small. We then focus on the conditions required to ensure that the linear weak learner itself achieves an error smaller than 1/2 − γ, where the advantage parameter γ is strictly positive and independent of the sample size. Such a condition guarantees that the generalization error of the boosted classifier decays to its minimal value at a rate of 1/√m, where m is the sample size. The required conditions, which are based solely on geometric concepts, can be easily verified for any data set in time O(m), and may serve as an indication of the effectiveness of linear classifiers as weak learners for a particular data set.
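The abstract does not reproduce the algorithm itself, but the weak-learning guarantee it builds on is the classical AdaBoost bound: if every weak hypothesis has weighted error at most 1/2 − γ with γ > 0, the empirical error of the voted classifier falls below ∏_t √(1 − 4γ_t²) ≤ exp(−2 Σ_t γ_t²). The sketch below is a minimal NumPy illustration rather than the authors' construction: it runs AdaBoost with decision stumps (a simple family of linear classifiers) as weak learners and records the per-round advantage γ_t. The function names and the exhaustive stump search are assumptions made for the example.

```python
import numpy as np

def adaboost_stumps(X, y, T):
    """AdaBoost with decision stumps (axis-aligned linear classifiers)
    as illustrative weak learners; labels y must be in {-1, +1}.
    Returns the stumps, their vote weights, and the per-round
    advantage gamma_t = 1/2 - err_t."""
    m, d = X.shape
    w = np.full(m, 1.0 / m)                    # distribution over examples
    stumps, alphas, gammas = [], [], []
    for _ in range(T):
        # Exhaustive search for the stump (feature, threshold, sign)
        # with smallest weighted error; O(d * m^2), fine for a demo.
        best, best_err = None, np.inf
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (j, thr, s)
        gamma = 0.5 - best_err                 # advantage of this round
        alpha = 0.5 * np.log((1 - best_err) / max(best_err, 1e-12))
        j, thr, s = best
        pred = s * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)         # upweight mistaken examples
        w /= w.sum()
        stumps.append(best); alphas.append(alpha); gammas.append(gamma)
    return stumps, alphas, gammas

def predict(stumps, alphas, X):
    """Sign of the weighted vote over all stumps."""
    F = np.zeros(len(X))
    for (j, thr, s), a in zip(stumps, alphas):
        F += a * s * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(F)
```

If every observed γ_t stays above some fixed γ > 0, the empirical error of the voted classifier is at most exp(−2Tγ²), so it drops below 1/m after T = O(log m / γ²) rounds; the geometric conditions of the paper are what certify, in O(m) time, that a linear weak learner can supply such a γ on a given data set.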
Similar resources
On the bounds in Poisson approximation for independent geometric distributed random variables
The main purpose of this note is to establish some bounds in Poisson approximation for row-wise arrays of independent geometrically distributed random variables using the operator method. Some results related to random sums of independent geometrically distributed random variables are also investigated.
Generalization Bounds for Convex Combinations of Kernel Functions
We derive new bounds on covering numbers for hypothesis classes generated by convex combinations of basis functions. These are useful in bounding the generalization performance of algorithms such as RBF-networks, boosting and a new class of linear programming machines similar to SV machines. We show that p-convex combinations with p > 1 lead to diverging bounds, whereas for p = 1 good bounds in...
Algorithmic Stability and Ensemble-based Learning, Ph.D. dissertation by Samuel Kutin, Department of Computer Science, The University of Chicago
We explore two themes in formal learning theory. We begin with a detailed, general study of the relationship between the generalization error and stability of learning algorithms. We then examine ensemble-based learning from the points of view of stability, decorrelation, and threshold complexity. A central problem of learning theory is bounding generalization error. Most such bounds have been ...
Generalization Error and Algorithmic Convergence of Median Boosting
We have recently proposed an extension of ADABOOST to regression that uses the median of the base regressors as the final regressor. In this paper we extend theoretical results obtained for ADABOOST to median boosting and to its localized variant. First, we extend recent results on efficient margin maximizing to show that the algorithm can converge to the maximum achievable margin within a pres...
Combining Kernel Machines Through Decorrelation
Motivation: Combinations of classifiers have been found useful empirically, yet no formal proof exists about their generalization ability. Our goal is to develop a combination of kernel machines for which it is possible to prove generalization bounds. We believe that this is possible by further elaborating the arguments presented in [6], which may provide insights on boosting methods and view-b...
Publication date: 2001